Individual Project - Unit 11

Multi-Agent Email Forensics System

Overview

The Multi-Agent Email Forensics System is an intelligent agent-based solution for automated security analysis of email communications. The system employs four specialized autonomous agents working in a coordinated pipeline to discover, analyze, visualize, and report on potential security threats.

Key Capabilities

4 Specialized Agents
29 Tests (100% Passing)
8 Visualizations
6 Pipeline Stages

System Architecture

The system implements a cooperative multi-agent architecture where four specialized agents work sequentially in a pipeline:

┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐     ┌────────────────┐
│  Discovery      │────▶│  Analysis    │────▶│  Dashboard      │────▶│  Report        │
│  Agent          │     │  Agent       │     │  Agent          │     │  Agent         │
└─────────────────┘     └──────────────┘     └─────────────────┘     └────────────────┘
        │                       │                      │                       │
        ▼                       ▼                      ▼                       ▼
   Email Files            Findings List         Visualizations          Reports

1. DiscoveryAgent

Role: Autonomous data acquisition

Locates email files and parses them into structured objects

Pattern: Repository

2. AnalysisAgent

Role: Threat detection

Applies 4 parallel detection strategies

Pattern: Strategy

3. DashboardAgent

Role: Visual analytics

Creates 8 visualization types

Pattern: Factory

4. ReportAgent

Role: Report generation

Produces text and HTML reports

Pattern: Template Method
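
The sequential hand-off between the four agents can be sketched as a chain of callables, each consuming the previous stage's output. This is a minimal illustration, not the project's actual code: `run_pipeline`, the stand-in agent functions, and the sample data are all hypothetical, and the real agents may pass richer context than a single value between stages.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage type: a name plus a callable that takes the
# previous stage's output and returns its own output.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], initial: Any) -> Dict[str, Any]:
    """Run the stages in order, feeding each output into the next stage."""
    results: Dict[str, Any] = {}
    data = initial
    for name, stage in stages:
        data = stage(data)
        results[name] = data
    return results


# Stand-in agents (placeholders only, not the project's real logic).
def discover(email_dir: str) -> list:
    return [
        {"id": 1, "subject": "URGENT: wire transfer"},
        {"id": 2, "subject": "Lunch on Friday"},
    ]


def analyze(emails: list) -> list:
    return [e["id"] for e in emails if "urgent" in e["subject"].lower()]


def visualize(findings: list) -> list:
    return [f"severity_distribution_{len(findings)}_findings.png"]


def report(charts: list) -> str:
    return "Report embedding: " + ", ".join(charts)


results = run_pipeline(
    [
        ("discovery", discover),
        ("analysis", analyze),
        ("dashboard", visualize),
        ("report", report),
    ],
    "output/emails",
)
```

Keeping every stage behind the same call signature is what lets the agents be developed and tested in isolation before they are wired together.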

Analysis Results & Visualizations

Summary Statistics

Figure: Email Forensics Summary Statistics (total emails, suspicious emails, external communications, and findings)

Key Visualizations

Figure: Activity Heatmap (email patterns by hour and day)
Figure: Severity Distribution (classification of findings)
Figure: Word Cloud (common terms in suspicious emails)

Six-Stage Pipeline

Stage 1: Data Generation

EnhancedEmailGenerator creates 50 test emails (30% suspicious, 70% normal) with realistic subjects, timestamps, and sender domains.

Output: Email files in output/emails/
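
A rough sketch of how a generator might produce the 50-email, 30%-suspicious mix. The function name `generate_emails`, the subject lines, the domains, and the two-week date window are invented for illustration; the real EnhancedEmailGenerator is not shown in this write-up and may work differently.

```python
import random
from datetime import datetime, timedelta

# All subjects and domains below are invented placeholders.
SUSPICIOUS_SUBJECTS = ["URGENT: verify your account", "Confidential bitcoin offer"]
NORMAL_SUBJECTS = ["Weekly status update", "Meeting notes"]
INTERNAL_DOMAIN = "company.example"
EXTERNAL_DOMAIN = "unknown-sender.example"


def generate_emails(total=50, suspicious_ratio=0.3, seed=42):
    """Return `total` email dicts, a fixed share of them marked suspicious."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    n_suspicious = round(total * suspicious_ratio)
    labels = [True] * n_suspicious + [False] * (total - n_suspicious)
    rng.shuffle(labels)

    start = datetime(2024, 1, 1)
    emails = []
    for index, suspicious in enumerate(labels, start=1):
        # Spread timestamps over an assumed two-week window.
        sent = start + timedelta(
            days=rng.randrange(14),
            hours=rng.randrange(24),
            minutes=rng.randrange(60),
        )
        emails.append({
            "id": index,
            "subject": rng.choice(SUSPICIOUS_SUBJECTS if suspicious else NORMAL_SUBJECTS),
            "sender": f"user{index}@{EXTERNAL_DOMAIN if suspicious else INTERNAL_DOMAIN}",
            "date": sent,
            "suspicious": suspicious,
        })
    return emails


emails = generate_emails()
```

Fixing the count of suspicious emails up front, rather than flipping a weighted coin per email, guarantees the exact 30/70 split on every run.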

Stage 2: Discovery

DiscoveryAgent autonomously locates and loads emails, parsing structure (ID, Subject, From, To, Date, Content) into SimpleEmail objects.

Output: List of SimpleEmail objects
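
The parsing step could look roughly like the following. The on-disk layout (one `Header: value` pair per line, a blank line, then the body), the `.txt` extension, and the helper names `parse_email_text` and `discover_emails` are assumptions made for this sketch; only the field names come from the description above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class SimpleEmail:
    id: str
    subject: str
    sender: str
    recipient: str
    date: str  # kept as the raw string here; real code may parse it
    content: str
    file_path: str


def parse_email_text(text: str, file_path: str = "") -> SimpleEmail:
    """Split headers from body at the first blank line and map the fields."""
    header_block, _, body = text.partition("\n\n")
    headers = {}
    for line in header_block.splitlines():
        # Split on the first colon only, so "Date: ... 22:15" stays intact.
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip().lower()] = value.strip()
    return SimpleEmail(
        id=headers.get("id", ""),
        subject=headers.get("subject", ""),
        sender=headers.get("from", ""),
        recipient=headers.get("to", ""),
        date=headers.get("date", ""),
        content=body.strip(),
        file_path=file_path,
    )


def discover_emails(directory: str) -> List[SimpleEmail]:
    """Load every *.txt file in the directory, in sorted filename order."""
    return [
        parse_email_text(path.read_text(encoding="utf-8"), str(path))
        for path in sorted(Path(directory).glob("*.txt"))
    ]


SAMPLE = (
    "ID: 7\n"
    "Subject: Urgent invoice\n"
    "From: a@ext.example\n"
    "To: b@corp.example\n"
    "Date: 2024-01-05 22:15\n"
    "\n"
    "Please pay today."
)
email = parse_email_text(SAMPLE)
```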

Stage 3: Analysis

AnalysisAgent applies four parallel detection strategies:

  1. Keyword Analysis: Scans for 21 suspicious terms (urgent, confidential, bitcoin, phishing, etc.)
  2. Temporal Analysis: Flags after-hours emails (outside 8 AM - 6 PM)
  3. Source Analysis: Identifies external domain communications
  4. Volume Analysis: Detects anomalous sending patterns

Output: List of Finding objects with severity classification (High/Medium/Low)
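
The four strategies can be sketched as interchangeable functions behind one signature, which is the core idea of the Strategy pattern. Several details here are assumptions: the keyword tuple lists only the four terms named above (the full 21-term list is not given), `INTERNAL_DOMAIN` and `VOLUME_THRESHOLD` are invented values, and the severity assigned by each strategy is illustrative. The strategies are independent of each other; this sketch simply runs them one after another.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class Email:
    id: str
    subject: str
    sender: str
    date: datetime
    content: str


@dataclass
class Finding:
    finding_type: str
    description: str
    email_id: str
    severity: str


KEYWORDS = ("urgent", "confidential", "bitcoin", "phishing")  # partial list
INTERNAL_DOMAIN = "company.example"  # assumed internal domain
VOLUME_THRESHOLD = 3                 # assumed per-sender limit

Strategy = Callable[[List[Email]], List[Finding]]


def keyword_strategy(emails: List[Email]) -> List[Finding]:
    findings = []
    for e in emails:
        text = f"{e.subject} {e.content}".lower()
        hits = [k for k in KEYWORDS if k in text]
        if hits:
            severity = "High" if len(hits) > 1 else "Medium"
            findings.append(Finding("keyword", "Matched: " + ", ".join(hits), e.id, severity))
    return findings


def temporal_strategy(emails: List[Email]) -> List[Finding]:
    # After-hours means outside 8 AM to 6 PM.
    return [Finding("temporal", f"Sent at {e.date:%H:%M}", e.id, "Medium")
            for e in emails if not 8 <= e.date.hour < 18]


def source_strategy(emails: List[Email]) -> List[Finding]:
    return [Finding("source", f"External sender {e.sender}", e.id, "Low")
            for e in emails if not e.sender.lower().endswith("@" + INTERNAL_DOMAIN)]


def volume_strategy(emails: List[Email]) -> List[Finding]:
    counts = Counter(e.sender for e in emails)
    # A volume finding concerns a sender, so email_id is left empty.
    return [Finding("volume", f"{sender} sent {n} emails", "", "Medium")
            for sender, n in counts.items() if n > VOLUME_THRESHOLD]


def analyze(emails: List[Email], strategies: List[Strategy]) -> List[Finding]:
    findings: List[Finding] = []
    for strategy in strategies:
        findings.extend(strategy(emails))
    return findings


emails = [
    Email("1", "Urgent: bitcoin payment", "x@evil.example",
          datetime(2024, 1, 5, 23, 10), "Send now"),
    Email("2", "Team lunch", "amy@company.example",
          datetime(2024, 1, 5, 12, 0), "Pizza?"),
]
findings = analyze(emails, [keyword_strategy, temporal_strategy,
                            source_strategy, volume_strategy])
```

In this example the first email triggers three findings (keyword, temporal, source) and the second triggers none, which shows how one message can be flagged from several perspectives at once.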

Stage 4: Visualization

DashboardAgent generates 8 visualizations using matplotlib, seaborn, and wordcloud:

  1. Summary statistics chart
  2. Email distribution pie chart
  3. Hourly activity histogram
  4. Subject word cloud
  5. Activity timeline
  6. Day-hour heatmap
  7. Network analysis
  8. Severity distribution

Output: PNG files in output/visualizations/
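
As an illustration of the data preparation behind the day-hour heatmap, the sketch below counts emails per weekday and hour using only the standard library. The function name `day_hour_matrix` is hypothetical, and the plotting step is omitted; a 7x24 matrix like this is the kind of input a heatmap function such as `seaborn.heatmap` accepts.

```python
from datetime import datetime
from typing import Iterable, List

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_hour_matrix(timestamps: Iterable[datetime]) -> List[List[int]]:
    """Return a 7x24 grid of counts: rows are weekdays, columns are hours."""
    matrix = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        matrix[ts.weekday()][ts.hour] += 1  # weekday(): Monday is 0
    return matrix


matrix = day_hour_matrix([
    datetime(2024, 1, 1, 9, 30),   # Monday morning
    datetime(2024, 1, 1, 9, 45),   # Monday morning, same hour
    datetime(2024, 1, 6, 23, 5),   # Saturday, late evening
])
```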

Stage 5: Reporting

ReportAgent consolidates results using Jinja2 templates into text and HTML formats with embedded visualizations.

Output: forensics_report.html and forensics_report.txt in output/reports/
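
The idea of rendering one set of results into both formats can be sketched as follows. To keep the example dependency-free it uses the standard library's `string.Template` in place of Jinja2, so it is not the project's template code; the function names, template layout, and sample finding are all hypothetical.

```python
from html import escape
from string import Template
from typing import List, Tuple

# Stand-in for a Jinja2 template: $-placeholders filled by substitute().
HTML_TEMPLATE = Template("""<html>
<body>
<h1>$title</h1>
<p>Findings: $total</p>
<ul>
$rows
</ul>
$charts
</body>
</html>""")


def render_html(title: str, findings: List[Tuple[str, str]], charts: List[str]) -> str:
    """Render (severity, description) pairs and chart paths as HTML."""
    rows = "\n".join(
        f"<li>[{escape(severity)}] {escape(description)}</li>"
        for severity, description in findings
    )
    images = "\n".join(f'<img src="{escape(path)}" alt="chart">' for path in charts)
    return HTML_TEMPLATE.substitute(
        title=escape(title), total=len(findings), rows=rows, charts=images
    )


def render_text(title: str, findings: List[Tuple[str, str]]) -> str:
    """Render the same findings as a plain-text report."""
    lines = [title, "=" * len(title)]
    lines += [f"[{severity}] {description}" for severity, description in findings]
    return "\n".join(lines)


sample = [("High", "Keyword <urgent> matched")]
html_report = render_html("Forensics Report", sample, ["severity.png"])
text_report = render_text("Forensics Report", sample)
```

Escaping every value before it enters the HTML matters here because email subjects and bodies are untrusted input.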

Stage 6: UML Documentation

The final stage auto-generates PlantUML class and sequence diagrams that document the system architecture.

Output: Documentation in output/uml_documentation/
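
One plausible way to auto-generate a class diagram is to introspect the data classes and emit PlantUML text. This is only a sketch of that idea: the helper names, the example `Finding` class, and the single arrow are assumptions, and the project's generator may build its diagrams differently.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List, Tuple


def class_to_plantuml(cls) -> str:
    """Emit a PlantUML class block listing the dataclass fields."""
    lines = [f"class {cls.__name__} {{"]
    for f in fields(cls):
        type_name = getattr(f.type, "__name__", str(f.type))
        lines.append(f"  +{f.name} : {type_name}")
    lines.append("}")
    return "\n".join(lines)


def class_diagram(classes: Iterable[type],
                  arrows: Iterable[Tuple[str, str]] = ()) -> str:
    """Wrap class blocks and 'A --> B' associations in @startuml/@enduml."""
    body: List[str] = [class_to_plantuml(c) for c in classes]
    body += [f"{source} --> {target}" for source, target in arrows]
    return "\n".join(["@startuml", *body, "@enduml"])


@dataclass
class Finding:
    finding_type: str
    severity: str


# "AnalysisAgent" appears only as an arrow endpoint in this example.
uml = class_diagram([Finding], arrows=[("AnalysisAgent", "Finding")])
```

The resulting string can be saved to a `.puml` file and rendered with the PlantUML tool.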

Technical Implementation

Data Models

SimpleEmail: Email dataclass with fields id, subject, sender, recipient, date, content, file_path

Methods:

Finding: Investigation finding with finding_type, description, email_id, severity, timestamp
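
The two models might be declared along these lines. The field names follow the description above, but the field types, the default values, and the severity check in `__post_init__` are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

SEVERITIES = ("High", "Medium", "Low")


@dataclass
class SimpleEmail:
    id: str
    subject: str
    sender: str
    recipient: str
    date: datetime
    content: str
    file_path: str = ""


@dataclass
class Finding:
    finding_type: str
    description: str
    email_id: str
    severity: str = "Low"
    # Each instance gets its own creation time.
    timestamp: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        # Assumed validation: reject severities outside the three levels.
        if self.severity not in SEVERITIES:
            raise ValueError(
                f"severity must be one of {SEVERITIES}, got {self.severity!r}"
            )


email = SimpleEmail("1", "Urgent", "a@x.example", "b@y.example",
                    datetime(2024, 1, 5, 22, 15), "body")
finding = Finding("temporal", "Sent at 22:15", email.id, "Medium")
```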

Dependencies

Testing & Validation

29 tests, 100% passing:

Personal Reflection

Building this multi-agent system from scratch revealed the practical value of autonomous, specialized agents working in coordinated pipelines. The separation of concerns between discovery, analysis, visualization, and reporting made the system easier to develop, test, and maintain.

Implementing four detection strategies taught me that effective security analysis requires multiple perspectives. Keyword detection alone misses after-hours anomalies, while temporal analysis alone misses sophisticated phishing sent during normal working hours. The comprehensive testing (29 tests, 100% passing) caught integration bugs early, particularly in the data flow between agents.

This project connected theoretical concepts about agent architectures (Wooldridge, 2009) with practical concerns like error handling and user experience. Generating professional visualizations taught me that intelligent systems must present results in accessible formats for non-technical stakeholders.

Source Artifacts: README, agent.py, main.py, utils.py, Presentation